The UMass RADIUS Project


1  Introduction 

The Research and Development for Image Understanding Systems (RADIUS) project is
a national effort to apply image understanding (IU) technology to support model-based
aerial image analysis [5]. Automated construction and management of 3-D geometric site
models enable efficient exploitation of the tremendous volume of information collected daily
by national sensors. The expected benefits are decreased work-load on human analysts,
together with an increase in measurement accuracy because of the introduction of digital
IU and photogrammetric techniques. When properly annotated, automatically generated
site models can provide the spatial context for specialized IU analysis tasks such as vehicle
counting, change detection, and damage assessment, while graphical visualization techniques
using 3-D site models are valuable for training and mission planning. Civilian benefits of
this technology are also numerous, including automated cartography, land-use surveying and
urban planning.

Over the past three years, the University of Massachusetts (UMass) has developed
techniques to automatically populate a site model with 3-D building models extracted from
multiple, overlapping images. There are many technical challenges involved in developing a
building extraction system that works reliably on the type of images being considered under
RADIUS. Multiple images of the scene may be captured by different cameras from arbitrary
viewing positions, and images may be collected months or even years apart, under vastly
different weather and lighting conditions. There is typically a lot of clutter surrounding
buildings (vehicles, pipes, oil drums, shrubbery) and on top of them (roof vents, air
conditioner units, ductwork), buildings often occlude each other in oblique views, and shadows
falling across building faces break up low-level extracted features such as line segments and
regions. To overcome these difficulties, the UMass design philosophy incorporates several
key ideas. First, 3-D reconstruction is based on geometric features that remain stable under
a wide range of viewing and lighting conditions. Second, rigorous photogrammetric camera
models are used to describe the relationship between pixels in an image and 3-D locations in
the scene, so that diverse sensor characteristics and viewpoints can be effectively exploited.
Third, information is fused across multiple images for increased accuracy and reliability.
Finally, known geometric constraints are applied whenever possible to increase the efficiency
and reliability of the reconstruction process.

This report is organized as follows. Section 2 presents an overview of the Automated
Site Construction, Extension, Detection and Refinement (ASCENDER) system, designed to
automatically acquire models of buildings with flat, rectilinear rooftops. Ascender is the
primary deliverable of the 3-year UMass RADIUS effort, and is currently being evaluated on
classified imagery at Lockheed-Martin. Section 3 presents results of an evaluation conducted
at UMass on an unclassified data set of Ft. Hood, Texas. The system is being extended via
new strategies for acquiring models of other common building classes, such as peaked and
multi-level roof structures, which are described in Section 4 and in [6]. Section 5 outlines
recent advances in the symbolic extraction of surface details, such as windows and doors,
and their applications to graphical rendering for scene visualization.


2  The  Ascender  System 


The Ascender system has been designed to automatically populate a site model with
buildings extracted from multiple, overlapping images exhibiting a variety of viewpoints and sun
angles. In mid-April 1995, Version 1.0 of the Ascender system was delivered to
Lockheed-Martin for testing on classified imagery and for integration into the RADIUS Testbed System
[5]. At the same time, an informal transfer was made to the National Exploitation
Laboratory (NEL) for familiarization and additional testing. This section presents a brief overview
of the Ascender system and its approach to extracting building models. More detailed
descriptions can be found in [2, 3, 4]. Some sample building models automatically generated
by the Ascender system are shown in Figures 1 and 2.


Figure  1:  Sample  building  model  automatically  generated  by  the  Ascender  system. 


2.1  System  Overview 

Figure 2: Some additional samples of building models generated by Ascender.

Ascender was developed on a Sun Sparc 20, using the Radius Common Development
Environment (RCDE) [8]. The RCDE is a combined Lisp/C++ system that supports the
development of image understanding algorithms for constructing and using site models. The
RCDE provides a convenient framework for representing and manipulating images, camera
models, object models and terrain models, and for keeping track of their various coordinate
systems, inter-object relationships, and transformation/projection equations. To be more
specific, the following items needed by Ascender are managed by the RCDE and assumed to
be present before the building extraction process begins:

• Images. A set of images, both nadir and oblique, that view the same area of the site.
Best results are obtained with images exhibiting a variety of viewing and sun angles.

• Site Coordinate System. A Euclidean, local-vertical coordinate system (Z-axis points
up) for representing building models.

• Camera Models. A specification of how 3-D locations in the site coordinate system
are related to 2-D image pixels in each image. One common camera representation is a 3 x 4
projective transformation matrix encoding the lens and pose parameters of each perspective
camera. Ascender also can handle the fast block interpolation projection (FBIP) camera
model used in the RCDE to represent the geometry of non-perspective cameras.

• Digital Terrain Map. A specification of the terrain underlying the site. This could
be as simple as a plane equation, or could be a full array of elevation values computed via
correlation-based stereo.
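
The 3 x 4 projective camera representation listed above can be illustrated with a short sketch. This is not RCDE code: the function name `project_point` and the matrix `P_EXAMPLE` are assumptions made up for illustration, describing a hypothetical nadir camera 1000 m above the site origin.

```python
def project_point(P, X):
    """Project a 3-D site point X = (x, y, z) to a pixel (u, v) using a 3x4 matrix P."""
    xh = (X[0], X[1], X[2], 1.0)  # homogeneous site coordinates
    u, v, w = (sum(p * c for p, c in zip(row, xh)) for row in P)
    if w == 0:
        raise ValueError("point projects to infinity")
    return u / w, v / w


# Hypothetical camera (illustration only): focal length 1000 pixels,
# principal point (500, 400), looking straight down from Z = 1000 m.
P_EXAMPLE = [
    [1000.0, 0.0, -500.0, 500000.0],
    [0.0, -1000.0, -400.0, 400000.0],
    [0.0, 0.0, -1.0, 1000.0],
]

if __name__ == "__main__":
    print(project_point(P_EXAMPLE, (0.0, 0.0, 0.0)))
    print(project_point(P_EXAMPLE, (10.0, 0.0, 500.0)))
```

With this matrix, a point 10 m east of the origin moves further from the principal point as its height increases, which is the parallax effect that the multi-image height estimation exploits.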


2.2  The  Building  Extraction  Process 

The Ascender system uses a straightforward control strategy to extract building models.
The process is described briefly here, with particular attention given to the algorithmic
parameters that can be set by the user to vary the number and quality of the resulting
building hypotheses.

Building detection begins by extracting straight line segments using the Boldt algorithm
[1]. Intensity edges are grouped recursively into longer straight lines with subpixel accuracy
via a set of Gestalt perceptual organization criteria. Two user thresholds, minimum line
length and minimum contrast (gray-level difference across the line), are available to control
the set of lines returned.

Two-dimensional building roof boundaries are hypothesized from extracted image line
segments via a graph-based perceptual grouping algorithm [7]. Line segments are grouped
into corners, chains, and eventually into complete closed polygons. A single variable
sensitivity parameter ranging from 0.0 (very low sensitivity) to 1.0 (very high) controls the settings
of several less-intuitive internal parameters that govern the polygon grouping process.

The recovery of 3-D building information begins by estimating a height for each
hypothesized 2-D roof polygon via multi-image epipolar matching. This estimate is chosen as
the peak in a height histogram formed by matching the polygon’s edges to line segments
in multiple images and allowing each potential match to vote for a height range. The size
of the epipolar search region in each image is governed by two parameters: the minimum
and maximum Z-values that building rooftops could be found at (the minimum value could
potentially be determined from an accurate terrain map). A third parameter that governs
the search for correspondences is the expected residual error (in pixels) between true and
observed 2-D feature locations, roughly summarizing the level of error in image features
caused by inaccuracies in the camera resection and feature extraction routines.
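
The histogram voting step can be sketched as follows. This is only an illustration of peak-picking over height ranges, not Ascender's implementation: the function name, the 0.5 m bin size, and the assumption that each potential match has already been converted to a (low, high) height interval are all hypothetical.

```python
import math


def estimate_roof_height(vote_ranges, z_min, z_max, bin_size=0.5):
    """Return the center of the height bin with the most votes, or None if no votes.

    vote_ranges: (low, high) height intervals, one per potential line match.
    z_min, z_max: search limits on rooftop height (the two user parameters).
    bin_size: histogram resolution in meters (an assumed value).
    """
    n_bins = max(1, math.ceil((z_max - z_min) / bin_size))
    votes = [0] * n_bins
    for lo, hi in vote_ranges:
        lo, hi = max(lo, z_min), min(hi, z_max)
        if lo >= hi:
            continue  # interval lies outside the search limits
        first = int((lo - z_min) / bin_size)
        last = min(math.ceil((hi - z_min) / bin_size) - 1, n_bins - 1)
        for b in range(first, last + 1):
            votes[b] += 1  # each match votes for every bin its interval covers
    if not any(votes):
        return None
    peak = max(range(n_bins), key=lambda b: votes[b])
    return z_min + (peak + 0.5) * bin_size


if __name__ == "__main__":
    ranges = [(4.0, 6.0), (5.0, 7.0), (5.0, 5.5), (12.0, 13.0)]
    print(estimate_roof_height(ranges, 0.0, 20.0))
```

Voting for a range rather than a single value lets matches with uncertain localization still reinforce one another, while an isolated spurious match (the 12 to 13 m interval in the example) does not shift the peak.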

After a set of matching line segments for the building roof is found, a rigorous
photogrammetric triangulation procedure is performed to determine the precise 3-D size, shape
and position of the building rooftop. The optimization criterion simultaneously minimizes
the sum-of-squared residual errors between projected 3-D roof polygon edges and
corresponding line segment features in all the images. There are no user parameters. The resulting
3-D polygon is then extruded down to the provided terrain to form a complete building
wireframe.
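
The final extrusion step is simple enough to sketch directly. The function below is an illustrative assumption, not the RCDE object model: it takes a roof polygon and a hypothetical `terrain_z(x, y)` callable standing in for the digital terrain map, and returns the wireframe as a list of 3-D edges.

```python
def extrude_to_terrain(roof, terrain_z):
    """Build wireframe edges by dropping each roof vertex vertically to the terrain.

    roof: list of (x, y, z) rooftop vertices in site coordinates.
    terrain_z: function (x, y) -> ground elevation (hypothetical terrain interface).
    """
    base = [(x, y, terrain_z(x, y)) for x, y, _ in roof]
    edges = []
    for i in range(len(roof)):
        j = (i + 1) % len(roof)
        edges.append((roof[i], roof[j]))  # roof edge
        edges.append((base[i], base[j]))  # footprint edge on the terrain
        edges.append((roof[i], base[i]))  # vertical wall edge
    return edges


if __name__ == "__main__":
    roof = [(0, 0, 8), (10, 0, 8), (10, 5, 8), (0, 5, 8)]
    for edge in extrude_to_terrain(roof, lambda x, y: 2.0):
        print(edge)
```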


3  Evaluation  on  Ft.  Hood  Imagery 

The  success  of  the  Ascender  system  will  ultimately  be  judged  by  its  performance  on  classified 
imagery.  Such  tests  are  currently  being  performed  at  Lockheed-Martin.  In  parallel  with  that 
effort, UMass is performing an in-depth system evaluation using unclassified data. The set
of experiments is designed to address questions like:

1.  How  is  the  rooftop  detection  rate  related  to  system  sensitivity  settings? 

2.  Is  the  detection  rate  affected  by  viewpoint  (nadir  vs.  oblique)? 

3. Does 2-D detected polygon accuracy vary by viewpoint?

4. Is 2-D accuracy related to sensitivity settings?

5. How does 3-D accuracy vary with the number of images used?

6.  How  does  3-D  accuracy  vary  according  to  2-D  accuracy  of  the  hypothesized  polygons? 


This section presents evaluation results on a large data set from Ft. Hood, Texas. The
imagery  was  collected  by  Photo  Science,  Inc.  (PSI)  in  October  1993  and  scanned  at  the 
Digital  Mapping  Laboratory  at  CMU  in  Jan-Feb.  1995.  Camera  resections  were  performed 
by  PSI  for  the  nadir  views,  and  by  CMU  for  the  obliques. 

3.1  Methodology 

An evaluation data set was cropped from the Ft. Hood imagery, yielding seven subimages
from the views labeled 711, 713, 525, 927, 1025, 1125 and 1325 (images 711 and 713 are
nadir views, the rest are obliques). Table 1 summarizes the ground sample distance (GSD)
for each image. The region of overlap covers an evaluation area of roughly 760 x 740
meters, containing a good blend of both simple and complex roof structures. Thirty ground
truth building models were created by hand using interactive modelling tools provided by
the RCDE. Each building is composed of RCDE “cube”, “house” and/or “extrusion”
objects that were shaped and positioned to project as well as possible (as determined by eye)
simultaneously into the set of seven images. The ground truth data set is shown in Figure 3.


Table  1:  Ground  sample  distances  (GSD)  in  meters  for  the  seven  evaluation  images.  A  GSD 
of  0.3  means  that  a  length  of  1  pixel  in  the  image  roughly  corresponds  to  a  distance  of  0.3 
meters  as  measured  on  the  ground. 


Image      711    713    525    927    1025   1125   1325
GSD (m)    0.31   0.31   0.61   0.52   1.10   1.01   1.01

Since the Ascender system explicitly recovers only rooftop polygons (the rest of the
building wireframe is formed by vertical extrusion), the evaluation is based on comparing
detected 2-D and triangulated 3-D roof polygons vs. their ground truth counterparts. There
are 73 ground truth rooftop polygons among the set of 30 buildings. Ground truth 2-D
polygons for each image are determined by projecting the ground truth 3-D polygons into
that image using the known camera projection equations.

Figure 3: Ft. Hood evaluation area with 30 ground truth building models composed of
single- and multi-level flat roofs, and two peaked roofs. There are 73 roof facets in all. The
size of the image area shown is 2375 x 1805 pixels.

The Center-Line Distance measures how well two arbitrary polygons match in terms of
size, shape and location*. The procedure is to oversample the boundary of one polygon
into a set of equally spaced points (several thousand of them). For each point, measure the
minimum distance from that point to the other polygon boundary. Repeat the procedure by
oversampling the other polygon and measuring the distance of each point to the first polygon
boundary. The center-line distance is taken as the average of all these values. This metric
provides a measure of the average distance between the two polygon boundaries, reported
in pixels for 2-D polygons, and in meters for 3-D polygons. We prefer the center-line distance
to other comparison measures, such as the one used in [9], since it is very easy to compute
and can be applied to two polygons that do not have the same number of vertices.
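
The center-line distance procedure described above can be sketched in a few lines. This is a minimal illustration written from the prose description, not the evaluation code used for the reported results; the function names and the default of 2000 samples per polygon are assumptions. It works for 2-D or 3-D vertex tuples.

```python
import math


def _dist(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def _edges(poly):
    """Closed polygon as a list of (start, end) vertex pairs."""
    return [(poly[i], poly[(i + 1) % len(poly)]) for i in range(len(poly))]


def _point_to_segment(p, a, b):
    """Distance from point p to the segment a-b."""
    ab = [bi - ai for ai, bi in zip(a, b)]
    denom = sum(c * c for c in ab)
    if denom == 0.0:
        return _dist(p, a)
    t = sum((pi - ai) * c for pi, ai, c in zip(p, a, ab)) / denom
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return _dist(p, [ai + t * c for ai, c in zip(a, ab)])


def _sample_boundary(poly, n_samples):
    """Return n_samples points equally spaced along the polygon boundary."""
    edges = _edges(poly)
    lengths = [_dist(a, b) for a, b in edges]
    step = sum(lengths) / n_samples
    points = []
    for k in range(n_samples):
        d = k * step  # arc length of the k-th sample
        for (a, b), length in zip(edges, lengths):
            if d <= length and length > 0.0:
                t = d / length
                points.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
                break
            d -= length
    return points


def center_line_distance(poly_a, poly_b, n_samples=2000):
    """Average boundary-to-boundary distance, sampled in both directions."""
    distances = []
    for src, dst in ((poly_a, poly_b), (poly_b, poly_a)):
        dst_edges = _edges(dst)
        for p in _sample_boundary(src, n_samples):
            distances.append(min(_point_to_segment(p, a, b) for a, b in dst_edges))
    return sum(distances) / len(distances)


if __name__ == "__main__":
    inner = [(0, 0), (10, 0), (10, 10), (0, 10)]
    outer = [(-1, -1), (11, -1), (11, 11), (-1, 11)]
    print(center_line_distance(inner, outer))
```

For a 10 x 10 square compared with a concentric 12 x 12 square, the result is slightly above 1: every point of the inner boundary is exactly 1 unit from the outer one, while points near the outer corners are a little further from the inner boundary.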

For polygons that have the same number of vertices, and are fairly close to each other
in terms of center-line distance, an additional distance measure is computed between
corresponding pairs of vertices between the two polygons. That is, for each polygon vertex, the
distance to the closest vertex on the other polygon is measured. For 2-D polygons these
Inter-Vertex Distances are reported in pixels, for 3-D polygons the units are meters, and the
distances are broken into their planimetric (distance parallel to the X-Y plane) vs. altimetric
(distance in Z) components.
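
For the 3-D case, the split into planimetric and altimetric components can be sketched as below. The function names are hypothetical, and the sketch measures from the vertices of one polygon to the other only; whether the reported figures pool both directions is not stated in the text, so that choice is an assumption.

```python
import math
from statistics import median


def inter_vertex_errors(poly_a, poly_b):
    """For each vertex of poly_a, find the closest vertex of poly_b and return
    (planimetric, altimetric) offsets: distance in the X-Y plane and in Z."""
    errors = []
    for ax, ay, az in poly_a:
        bx, by, bz = min(
            poly_b,
            key=lambda b: math.sqrt(
                (b[0] - ax) ** 2 + (b[1] - ay) ** 2 + (b[2] - az) ** 2
            ),
        )
        errors.append((math.hypot(bx - ax, by - ay), abs(bz - az)))
    return errors


def median_errors(poly_a, poly_b):
    """Median planimetric and altimetric inter-vertex errors."""
    errors = inter_vertex_errors(poly_a, poly_b)
    return median(e[0] for e in errors), median(e[1] for e in errors)


if __name__ == "__main__":
    truth = [(0, 0, 10), (20, 0, 10), (20, 20, 10), (0, 20, 10)]
    recon = [(x + 0.3, y + 0.4, z + 0.5) for x, y, z in truth]
    print(median_errors(recon, truth))
```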

* Robert Haralick, private communication, 1996.

3.2 Evaluation of 2-D Detection


One important module of the Ascender system is the 2-D polygonal rooftop detector. The
detector was tested on images 711, 713, 525 and 927 to see how well it performed at different
grouping sensitivity settings, and with different length and contrast settings of the Boldt line
extraction algorithm. The detector was tested by projecting each ground truth roof polygon
into an image, growing its 2-D bounding box out by 20 pixels on each side, then invoking the
building detector in that region to hypothesize 2-D rooftop polygons. The evaluation goals
were to determine both true and false positive detection rates when the building detector
was invoked on an area containing a building, and to measure the 2-D accuracy of the true
positives.

3.2.1  Detection  Rates 

The polygon detector typically produces several roof hypotheses within a given image area,
particularly when run at the higher sensitivity settings. Thus, determining true and false
positive detection rates involves determining whether or not each hypothesized image polygon
is a good match with some ground truth projected roof polygon. To automate the process
of counting true positives, each hypothesized polygon was ranked by its center-line distance
from the known ground truth 2-D polygon that was supposed to be detected. Of all
hypotheses with distances less than a threshold (i.e. polygons that were reasonably good matches to
the ground truth), the one with the smallest distance was counted as a true positive; all other
hypotheses were considered to be false positives. The threshold value used was 0.2 times the
square root of the area of the ground truth polygon, that is:
$\mathrm{Dist}(hyp, gt) < 0.2 \sqrt{\mathrm{Area}(gt)}$,
where “hyp” and “gt” are hypothesized and ground truth polygons, respectively. This
empirical threshold allows 2 pixels total error for a square with sides 10 pixels long, and varies
linearly with the scale of the polygon.
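
The counting rule above can be written compactly. The sketch below assumes the center-line distances from each hypothesis to the ground truth polygon have already been computed; the function name and return convention are illustrative, not taken from Ascender.

```python
import math


def classify_hypotheses(distances, gt_area, factor=0.2):
    """Apply the true/false positive rule for one ground truth polygon.

    distances: center-line distance of each hypothesis to the ground truth polygon.
    gt_area: area of the ground truth polygon (pixels squared for 2-D).
    Returns (index of the true positive or None, list of false positive indices).
    """
    threshold = factor * math.sqrt(gt_area)
    candidates = [i for i, d in enumerate(distances) if d < threshold]
    true_pos = min(candidates, key=lambda i: distances[i]) if candidates else None
    false_pos = [i for i in range(len(distances)) if i != true_pos]
    return true_pos, false_pos


if __name__ == "__main__":
    # A 10 x 10 pixel square has area 100, so the threshold is 2 pixels.
    print(classify_hypotheses([3.5, 1.2, 0.8, 9.0], gt_area=100.0))
```

Note that a hypothesis under the threshold but not the closest one (index 1 in the example) is still counted as a false positive, so at most one true positive is credited per ground truth polygon.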

The total numbers of roof hypotheses generated for images 711, 713, 525 and 927 are
shown at the top of Figure 4 for nine different sensitivity settings of the building detector
ranging from 0.1 to 0.9 (very low to very high). The line segments used for each image were
computed by the Boldt algorithm using length and contrast thresholds of 10. The second
graph in Figure 4 plots the number of true positive hypotheses. For the highest sensitivity
setting, the percentages of rooftops detected in 711, 713, 525 and 927 were 51 percent, 59
percent, 45 percent and 47 percent, respectively. The graph also shows the number of
true positives achieved by combining the hypotheses from all four images, either by pooling
hypotheses computed separately for each image, or by recursively masking out previously
detected buildings and focusing on the unmodeled areas in each new image [2]. For the
highest sensitivity setting, this strategy detects 81 percent (59 out of 73) of the rooftops in
the scene.

The detection rates seem to be sensitive to viewpoint. More total hypotheses and more
true positives were detected in the nadir views than in the obliques. This may represent a
property of the building detector, but it also is likely that most of the discrepancy is caused
by the difference in GSD of the images for this area (see Table 1). Each building roof occupies
a larger set of pixels in the nadir views than in the obliques, for this data set.

Figure 4: Top: Building detector sensitivity vs. total number of roof hypotheses. Bottom:
Sensitivity vs. number of true positives. Horizontal lines show the actual number of ground
truth polygons. Combining results from all four views yields a “best” detection rate of 81
percent with lines of length > 10, and 97 percent with lines of length > 5.

To  measure  the  best  possible  performance  of  the  rooftop  detector  on  this  data,  it  was 
run on all four images at sensitivity level 0.9, using Boldt line data computed with length
and  contrast  thresholds  of  5.  These  were  judged  to  be  the  highest  sensitivity  levels  for  both 
line  extractor  and  building  detector  that  were  feasible,  and  the  results  represent  the  best 
job  that  the  building  detector  can  possibly  do  with  each  image.  The  percentages  of  rooftops 
detected  in  each  of  the  four  images  under  these  conditions  were  86  percent,  84  percent,  74 
percent,  and  67  percent,  with  a  combined  image  detection  rate  of  97  percent  (71  out  of  73). 
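
As a check on the arithmetic, the two combined detection rates quoted in this section follow directly from the 73 ground truth rooftop polygons:

```latex
\[
\frac{59}{73} \approx 0.808 \;\;(\text{reported as } 81\%), \qquad
\frac{71}{73} \approx 0.973 \;\;(\text{reported as } 97\%)
\]
```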


3.2.2  Quantitative  Accuracy 

To  assess  the  quantitative  accuracy  of  the  true  positive  2-D  roof  polygons,  each  was  compared 
with  its  corresponding  2-D  projected  ground  truth  polygon  in  terms  of  center-line  distance. 
Figure  5  plots  the  median  of  the  center-line  polygon  distances  between  detected  and  ground 
truth  2-D  polygons,  for  different  sensitivity  settings.  Polygons  detected  at  low  sensitivity 
levels seem to be slightly more accurate than those detected at the high sensitivity settings.
This is so because the detector only finds clearly delineated rooftop boundaries at the lower
settings,  and  is  more  forgiving  in  its  grouping  criteria  at  the  higher  settings. 


Figure  5:  Building  detector  sensitivity  vs.  2-D  polygon  accuracy  in  pixels  (see  text). 

For pairs of detected and ground truth polygons having the same number of vertices, their
set of inter-vertex distances also were computed, and the medians of those measurements
are broken down by image in Table 2. The average distance is around 2.7 pixels. Polygons
detected in image 927 appear to be a little more accurate. This difference may or may not
be significant; however, image 927 was taken in the afternoon, and all the other images were
taken in the morning, so the difference in sun angle may be the cause.

Table 2: Median inter-vertex distances (in pixels) between detected polygon vertices and
projected ground truth roof vertices, for four images.

Image          711    713    525           927
IV distance    2.75   2.82   (illegible)   (illegible)

3.3  Evaluation  of  3-D  Reconstruction 

The second major subsystem in Ascender takes 2-D roof hypotheses detected in one image
and reconstructs 3-D rooftop polygons via multi-image line segment matching and
triangulation. Two different quantitative evaluations were performed on this subsystem. The
3-D reconstruction process was first tested in isolation from the 2-D detection process by
using 2-D projected ground truth polygons as input. This initial evaluation was done to
establish a baseline measure of reconstruction accuracy, that is, to see how accurate the final
3-D building models would be given perfect 2-D rooftop extraction. A second evaluation
tested end-to-end system performance by performing 3-D reconstruction using the set of
automatically detected 2-D image polygons from the previous section.


3.3.1  Baseline  Reconstruction  Accuracy 

The baseline measure of reconstruction accuracy was performed using 2-D projected ground
truth roof polygons. For each of the 7 images in the evaluation test set, all the ground truth
2-D polygons from that image were matched and triangulated using the other 6 images as
corroborating views. The accuracy of each reconstructed roof polygon was then determined
by comparing it with its 3-D ground truth counterpart in terms of center-line distance and
inter-vertex distances. Table 3 reports, for each image, the median of the center-line polygon
distances between reconstructed and ground truth polygons for that image. Also reported
are the medians of the planimetric (horizontal) and altimetric (vertical) components of the
inter-vertex distances between reconstructed and ground truth polygon vertices. Horizontal
placement accuracy was about 0.3 meters, which is in accordance with the resolution of the
images.


Table 3: Baseline accuracy of the 3-D reconstruction process. Median center-line distances as
well as inter-vertex planimetric and altimetric errors are shown (in meters) for four images.
See text.

Image             711    713    525    927
CL distance       0.57   0.46   0.45   0.53
IV planimetric    0.29   0.25   0.33   0.35
IV altimetric     0.49   0.42   0.37   0.43

Another suite of tests was performed to determine how the number of views affects the
accuracy of the resulting 3-D polygons. These tests were performed using image 711 as the
primary image, and all 63 non-empty subsets of the other 6 views as additional views. For
each subset of additional views, all 2-D projected ground truth polygons in image 711 were
matched and triangulated, and the median center-line and inter-vertex distances between
reconstructed and ground truth 3-D polygons were recorded. Figure 6 graphs the results,
organized by number of images used (including 711), ranging from only two views up to six
views. The distances reported under label “2” are averaged over the 6 possible image sets
containing 711 and one other image, distances reported under “3” are averaged over all 15
possible image sets containing 711 and two other images, and so on. There is a noticeable
improvement in accuracy when using three views instead of two, but the curves flatten out
after that, and there is little improvement in accuracy gained by taking image sets larger
than four.

Figure 6: Number of views used vs. 3-D reconstruction accuracy in meters. See text.
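
The image-set counts quoted in this experiment (6 two-view sets, 15 three-view sets, 63 sets in total) can be reproduced by enumerating subsets of the six additional views. The sketch below uses the view labels from Section 3.1; the function name and the dictionary layout are illustrative assumptions.

```python
from itertools import combinations

OTHER_VIEWS = ["713", "525", "927", "1025", "1125", "1325"]


def image_sets_by_size(primary="711", others=OTHER_VIEWS):
    """Map total view count -> list of image sets that contain the primary image."""
    return {
        k + 1: [(primary,) + combo for combo in combinations(others, k)]
        for k in range(1, len(others) + 1)
    }


if __name__ == "__main__":
    sets = image_sets_by_size()
    for n_views, image_sets in sets.items():
        print(n_views, len(image_sets))
    print("total:", sum(len(s) for s in sets.values()))
```

Averaging the per-set medians within each group size would then give one value per group, which matches the organization by number of images described for Figure 6.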

3.3.2  Actual  Reconstruction  Accuracy 

In actual practice, Ascender reconstruction techniques are applied to the 2-D image polygons
hypothesized by its automated building detector. Thus, the final reconstruction accuracy
depends not only on the number and geometry of the additional views used, but also on the
2-D image accuracy of the hypothesized roof polygons. The typical end-to-end performance of
the system was evaluated by taking the 2-D polygons detected in Section 3.2.1 and performing
matching and triangulation using the other six views. The median center-line distances
between reconstructed and ground truth 3-D polygons are plotted in Figure 7 for different
sensitivity settings of the polygon detector. The accuracy is slightly better when using
polygons detected at the lower sensitivity settings, mirroring the better accuracy of the 2-D
polygons at those levels (compare with Figure 5).

Figure 7: Building detector sensitivity vs. 3-D polygon accuracy, computed as the median of
center-line distances between reconstructed 3-D polygons and ground truth roof polygons.


For pairs of detected and ground truth polygons having the same number of vertices,
the set of inter-vertex planimetric and altimetric errors were computed, and the medians
of those measurements are shown in Table 4, broken down by the image in which the 2-D
polygons feeding the reconstruction process were hypothesized. Unlike the baseline error
data from Table 3, where the horizontal accuracy of reconstructed polygon vertices was
better than their vertical accuracy, here the situation is reversed, strongly suggesting that
the planimetric component of reconstructed vertices is more sensitive to inaccuracies in the
2-D polygon detection process than the altimetric component. This result is consistent with
previous observations that the corners of Ascender’s reconstructed building models are more
accurate in height than in horizontal position [4].


Table 4: Median planimetric and altimetric errors (in meters) between reconstructed 3-D
polygon vertices and ground truth roof vertices.

Image             711    713    525    927
IV planimetric    0.68   0.73   1.09   0.89
IV altimetric     0.51   0.55   0.90   0.61

3.4  Summary 

This  section  has  presented  preliminary  results  of  an  on-going  evaluation  of  the  Ascender 
system  using  an  unclassified  Ft.  Hood  data  set.  While  the  results  of  the  analysis  are 
inevitably tied to this specific data set, they give us some indication of how the system
should  be  expected  to  perform  under  different  scenarios. 

Single-Image Performance: The building detection rate varies roughly linearly with the
sensitivity setting of the polygon detector. At the high sensitivity level, roughly 50 percent
of the buildings are detected in each image using Boldt lines extracted at a medium level
of sensitivity (length and contrast > 10), and about 75-80 percent when using Boldt lines
extracted at a high level of sensitivity (length and contrast > 5). Although line segments
and corner hypotheses are localized to subpixel accuracy, the median localization error of
2-D rooftop polygon vertices is around 2-3 pixels, due in part to grouping errors, but also
in part to errors in resected camera pose (even a perfectly segmented polygon boundary
will not align with the projected ground truth roof if the camera projection parameters are
incorrect).

Multiple-Image Performance: One of our underlying research hypotheses is that the use
of multiple images increases the accuracy and reliability of the building extraction process.
Rooftops that are missed in one image are often found in another, so combining results
from multiple images typically increases the building detection rate. By combining detected
polygons from four images, the total building detection rate increased to 81 percent using
medium-sensitivity Boldt lines, and to 97 percent using high-sensitivity ones. Matching
and triangulation to produce 3-D roof polygons, and thus the full building wireframe by
extrusion, can perform at satisfactory levels of accuracy given only a pair of images, but
using three views gives noticeably better results. After four images, only a modest increase
in 3-D accuracy is gained.

Of course, any of these general statements depends critically on the particular
configuration of views used. Further testing is needed to elucidate how different camera positions
and orientations affect 3-D accuracy. Nadir views appear to produce better detection rates
than obliques, but this can be explained by large differences in GSD for this image set and
may not be characteristic of system performance in general; again, more experimentation is
needed. For this data set, 3-D building corner positions were recovered to well within a
meter of accuracy, with height being estimated more accurately than horizontal position. The
accuracy of the final reconstruction depends on the accuracy of the detected 2-D polygons,
as one might expect; however, horizontal accuracy is more sensitive to 2-D polygon errors
than vertical accuracy. How 3-D accuracy is related to errors in resected camera pose is an
issue that is currently under analysis. Also, the version of Ascender tested here uses only a
simple control strategy for detecting flat-roofed buildings; more complex control strategies
under development may yield more robust results.


4  3-D  Grouping  and  Data  Fusion 

The  building  reconstruction  strategies  used  in  the  Ascender  system  provide  an  elegant  solution
to  extracting  flat-roofed  rectilinear  buildings,  but  extensions  are  necessary  in  order  to
handle  other  common  building  types.  Examples  are  multi-level  flat  roofs  (or  single-level  flat
roofs  containing  significant  substructures,  such  as  large  air  conditioner  units),  peaked-roof
buildings,  juxtapositions  of  flat  and  peaked  roofs,  curved-roof  buildings  such  as  Quonset
huts  or  hangars,  as  well  as  buildings  with  more  complex  roof  structures  containing  gables,
slanted  dormers  or  spires.

The  building  reconstruction  strategies  used  in  Ascender  are  reasonably  effective,  but
are  tuned  to  extract  only  one  generic  building  class  with  single-level,  flat  roofs  bounded
by  rectilinear  polygonal  shapes.  As  a  result,  polygonal  rooftop  detection  strategies  can
easily  be  carried  out  entirely  in  2-D.  For  example,  verifying  that  a  hypothesized  2-D  image
corner  could  be  the  projection  of  a  horizontal  roof  corner  in  the  3-D  scene  can  be  performed
based  only  on  the  orientation  and  angle  of  the  2-D  corner  in  the  image,  together  with  the
known  camera  pose  information.  Furthermore,  determination  of  the  probable  height  of  a
hypothesized  rooftop  polygon  can  be  achieved  using  a  simple  one-dimensional  histogram-based
technique  where  the  disparities  of  potential  polygon  line  segment  matches  within
epipolar  search  regions  across  multiple  images  vote  directly  for  a  consensus  3-D  roof  height
in  the  scene.
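
The one-dimensional voting step can be illustrated with a minimal Python sketch. It assumes that each candidate line-segment match has already been converted into the roof height it implies; the function name, the bin size, and the choice to average the votes in the winning bin are illustrative assumptions, not details taken from Ascender.

```python
from collections import Counter


def consensus_height(candidate_heights, bin_size=0.5):
    """Return a consensus height from a list of candidate heights (meters).

    Each candidate votes for one histogram bin of width `bin_size`.
    The result is the mean of the candidates in the most-voted bin,
    or None if there are no candidates.
    """
    if not candidate_heights:
        return None
    # One vote per candidate, indexed by integer bin number.
    bins = [int(h // bin_size) for h in candidate_heights]
    votes = Counter(bins)
    # Highest vote count wins; ties are broken toward the lower bin.
    best_bin = max(votes, key=lambda b: (votes[b], -b))
    winners = [h for h, b in zip(candidate_heights, bins) if b == best_bin]
    return sum(winners) / len(winners)
```

Spurious matches (for example, against shadow edges or neighboring buildings) scatter their votes across many bins, while correct matches from several images pile up in one bin, which is why a simple peak search can be sufficient for flat roofs.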

To  develop  more  general  and  flexible  building  extraction  systems,  a  significant  research
effort  is  underway  at  UMass  to  explore  alternative  detection  and  reconstruction  strategies 
that  combine  a  wider  range  of  2-D  and  3-D  information.  The  types  of  strategies  being 
considered  involve  generation  and  grouping  of  3-D  geometric  tokens  such  as  lines,  corners 
and  surfaces,  as  well  as  techniques  for  fusing  geometric  token  data  with  high-resolution 
digital  elevation  map  (DEM)  data.  By  verifying  geometric  consistencies  between  2-D  and 
3-D  tokens  associated  with  building  components,  larger  and  more  complex  3-D  structures 
are  being  organized  using  context-sensitive,  knowledge-based  strategies. 

A  more  comprehensive  description  of  the  new  types  of  extracted  geometric  features, 
and  methods  for  grouping/fusing  them,  is  given  in  [6].  Here,  we  briefly  outline  two  of  the
new  reconstruction  strategies  that  have  been  developed  as  direct,  incremental  extensions
to  current  Ascender  technology:  computation  and  grouping  of  2.5-D  line  segments,  and
parametric  DEM  surface  fitting  bounded  by  2-D  polygonal  roof  hypotheses.

4.1  Extracting/Grouping  2.5-D  Lines

A  3-D  scene  line  that  is  perpendicular  to  gravity  can  be  represented  as  a  2-D  image  line
segment  plus  its  associated  scene  elevation.  We  call  this  representation  “2.5-D”  line  segments.
Sets  of  2.5-D  lines  are  computed  by  taking  2-D  Boldt  line  segments  for  an  image  and
augmenting  each  with  an  elevation  value  computed  via  multi-image  matching.  The  elevation
estimate  for  each  line  segment  is  formed  by  histogramming  the  set  of  elevations  implied  by
potential  corresponding  segments  within  epipolar-constrained  search  regions  across  multiple
images.  This  is  essentially  the  same  algorithm  that  is  used  in  Ascender  to  estimate  the  height
of  flat  roof  polygons  in  the  scene,  except  it  is  applied  to  an  individual  line  segment  rather
than  to  the  set  of  edges  bounding  a  polygonal  roof  hypothesis.

The  graph-based  perceptual  organization  algorithm  used  in  Ascender  for  organizing  lines
and  corners  into  closed  2-D  polygons  [7]  has  been  modified  to  handle  2.5-D  lines.  An  additional
set  of  3-D  consistency  checks  has  been  introduced  to  ensure  that  compatible  lines  and
corners  are  roughly  at  the  same  elevation  in  the  scene.  Individual  line  heights  are  combined
and  propagated  into  grouped  corner,  chain,  and  polygon  hypotheses.  The  results  are  closed
2-D  polygons  with  associated  elevation  values,  which  are  easily  converted  into  flat  3-D  roof
polygons  using  the  known  camera  projection  equations.  The  benefit  of  the  2.5-D  approach  to
roof  polygon  detection  is  that  image  line  segments  caused  by  shadows  and  ground-level  features
are  automatically  ignored,  and  there  is  less  chance  of  overgrouping  multiple  roof  levels
into  a  single  polygon  hypothesis  containing  edges  that  actually  occur  at  different  elevations
in  the  scene  (Figure  8).
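
A minimal Python sketch of the elevation consistency idea follows. The `Line25D` type, the tolerance value, and the length-weighted averaging used to combine line heights are assumptions made for illustration; the text does not specify how Ascender combines or thresholds the heights.

```python
import math
from dataclasses import dataclass


@dataclass
class Line25D:
    """A 2-D image line segment plus an estimated scene elevation (meters)."""
    x1: float
    y1: float
    x2: float
    y2: float
    elevation: float

    def length(self):
        return math.hypot(self.x2 - self.x1, self.y2 - self.y1)


def elevations_compatible(a, b, tol=0.5):
    """True if two 2.5-D lines are roughly at the same scene elevation."""
    return abs(a.elevation - b.elevation) <= tol


def combined_elevation(lines):
    """Length-weighted mean elevation for a group of 2.5-D lines."""
    total = sum(line.length() for line in lines)
    return sum(line.length() * line.elevation for line in lines) / total
```

In a grouping pass, a check such as `elevations_compatible` would be applied before two lines are allowed to form a corner, so that a roof edge and a shadow edge at ground level are never joined even if they meet at a plausible angle in the image.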


Figure  8:  Using  2.5-D  lines  in  the  grouping  process  helps  disambiguate  multi-level  building
roofs  (note  the  building  shadow,  which  shows  two  distinct  roof  levels).  The  Z-coordinates
of  vertices  on  the  left  and  right  2.5-D  polygon  hypotheses  are  260.32  and  261.66  meters,
respectively,  as  compared  with  ground  truth  Z-values  of  260.65  and  262.31.

4.2  Surface  Fitting  to  DEM  Data


A  second  building  detection  extension  that  has  proven  very  effective  is  to  directly  fuse  2-D
rooftop  polygon  hypotheses  with  high-resolution  DEM  data  in  order  to  estimate  various
classes  of  parametrically  modeled  3-D  rooftop  surfaces.  The  DEM  data  is  produced  from  a
pair  of  overlapping  images  by  hierarchical,  area-based  correlation  matching  along  epipolar
lines  [10].  In  order  to  extract  parametric  surfaces,  pixels  within  each  detected  roof  polygon
are  backprojected  onto  the  DEM  data  to  determine  a  set  of  sampled  3-D  points.  Since  the
DEM  data  are  potentially  noisy,  because  of  rooftop  clutter  and  mismatches,  robust  statistical
estimation  techniques  are  used  to  do  the  fitting.
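
The text does not say which robust estimator is used. As one possible illustration, the Python sketch below fits a planar roof z = a*x + b*y + c to sampled 3-D points with a random-sample consensus scheme: planes through random point triples are scored by how many samples lie within a tolerance. The function names, tolerance, iteration count, and fixed seed are all assumptions.

```python
import random


def plane_through(p, q, r):
    """Plane z = a*x + b*y + c through three points, or None if degenerate."""
    dx1, dy1, dz1 = q[0] - p[0], q[1] - p[1], q[2] - p[2]
    dx2, dy2, dz2 = r[0] - p[0], r[1] - p[1], r[2] - p[2]
    det = dx1 * dy2 - dx2 * dy1
    if abs(det) < 1e-12:  # points are collinear in the x-y plane
        return None
    a = (dz1 * dy2 - dz2 * dy1) / det
    b = (dx1 * dz2 - dx2 * dz1) / det
    c = p[2] - a * p[0] - b * p[1]
    return a, b, c


def fit_plane_consensus(points, tol=0.3, iterations=200, seed=0):
    """Return ((a, b, c), inlier_count) for the best-supported sampled plane."""
    rng = random.Random(seed)  # fixed seed keeps the sketch repeatable
    best_plane, best_count = None, -1
    for _ in range(iterations):
        plane = plane_through(*rng.sample(points, 3))
        if plane is None:
            continue
        a, b, c = plane
        count = sum(1 for x, y, z in points
                    if abs(z - (a * x + b * y + c)) <= tol)
        if count > best_count:
            best_plane, best_count = plane, count
    return best_plane, best_count
```

Because the score counts only samples close to the plane, DEM points raised by roof vents or lowered by correlation mismatches have no pull on the result, unlike in an ordinary least-squares fit.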

Three  types  of  surface  fits  have  been  used  to  date:  planar,  peaked,  and  curved.  An 
important  issue  is  how  to  decide  which  parametric  model  to  use  for  fitting  the  DEM  data 
associated  with  a  given  rooftop  hypothesis.  In  some  cases  building  shadows  can  provide 
information  about  the  profile  of  the  rooftop.  An  alternative  approach  is  to  fit  a  number 
of  different  parametric  classes  simultaneously,  and  simply  choose  the  one  that  best  fits  the
data. 
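
The "fit several classes and keep the best" approach can be sketched in Python as follows. The sketch is deliberately simplified and rests on several assumptions: it works on a single roof cross-section, with samples given as (offset from the roof center line, height); it compares only a peaked profile (height linear in |offset|) with a curved profile (height quadratic in offset); and it uses ordinary least squares in place of the robust estimation mentioned above. A planar class is omitted because a constant height is a special case of both profiles, so comparing it fairly would require some penalty on model complexity.

```python
def fit_one_feature(features, heights):
    """Least-squares fit of height = a + b * feature; returns (a, b, sse)."""
    n = len(heights)
    mean_f = sum(features) / n
    mean_z = sum(heights) / n
    sxx = sum((f - mean_f) ** 2 for f in features)
    sxz = sum((f - mean_f) * (z - mean_z) for f, z in zip(features, heights))
    b = sxz / sxx
    a = mean_z - b * mean_f
    sse = sum((z - (a + b * f)) ** 2 for f, z in zip(features, heights))
    return a, b, sse


# Each hypothetical roof class maps the cross-section offset to one feature.
ROOF_MODELS = {
    "peaked": abs,              # height = a + b * |u|
    "curved": lambda u: u * u,  # height = a + b * u^2
}


def select_roof_model(samples):
    """Fit every model class and return (name, (a, b, sse)) of the best fit."""
    offsets = [u for u, _ in samples]
    heights = [z for _, z in samples]
    fits = {}
    for name, feature in ROOF_MODELS.items():
        fits[name] = fit_one_feature([feature(u) for u in offsets], heights)
    best = min(fits, key=lambda name: fits[name][2])
    return best, fits[best]
```

Both candidate classes here have the same number of free parameters, so comparing their residuals directly is reasonable; with classes of differing complexity, the richer class would tend to win by default.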

Figure  9  shows  an  example  of  three  parametric  peaked-roof  surfaces  that  have  been  fit  to 
the  DEM  data  within  local  areas  defined  by  building  hypotheses  generated  by  Ascender.  It
is  important  to  run  Ascender  on  nadir  views  in  this  case,  since  the  goal  is  to  make  the  system
hypothesize  a  2-D  flat-roofed  polygon  that  completely  surrounds  the  peaked  roof.  Encoding
this  type  of  knowledge  about  how  and  when  to  apply  such  context-specific  building  extraction 
strategies  is  an  important  issue  to  consider  when  designing  an  operational  vision  system  [11]. 


Figure  9:  Three  parametric  peaked-roof  surfaces  that  have  been  fit  to  DEM  data  within
building  boundaries  hypothesized  by  Ascender.  Compare  with  the  raw  DEM  building  data
at  the  top  of  the  image.

5  Extracting  Surface  Structures  for  Visualization


One  of  the  benefits  that  a  softcopy,  3-D  model-based  approach  to  site  analysis  has  over  the 
traditional  2-D  image-based  approach  is  that  the  image  analyst  can  generate  interactive,
visual  displays  of  the  site  from  any  viewpoint.  Rapid  improvements  in  the  capability  of
low-end  to  medium-end  graphics  hardware  make  the  use  of  intensity  mapping  an  attractive
option  for  visualizing  geometric  site  models,  with  near  real-time  virtual  reality  displays
achievable  on  high-end  workstations.  These  graphics  capabilities  have  resulted  in  a  demand
for  algorithms  that  can  automatically  acquire  the  necessary  surface  intensity  maps  from
available  digital  photographs.  Under  the  RADIUS  project,  UMass  has  previously  developed 
routines  for  acquiring  image  intensity  maps  for  the  planar  facets  (walls  and  roof  surfaces)  of 
each  recovered  building  model  [3,  4].  Each  surface  intensity  map  is  a  composite  formed  from 
the  best  available  views  of  that  building  face,  processed  to  remove  perspective  distortion
caused  by  obliquity  and  visual  artifacts  caused  by  shadows  and  occlusions.  An  example  of 
a  building  from  RADIUS  Model  Board  1  rendered  using  automatically  acquired  intensity 
maps  is  shown  at  the  top  of  Figure  10. 

Although  intensity  mapping  enhances  the  virtual  realism  of  graphic  displays,  this  illusion
of  realism  is  greatly  reduced  as  the  observer’s  viewpoint  comes  closer  to  the  rendered  object
surface.  For  example,  straightforward  mapping  of  an  image  intensity  map  onto  a  flat  wall
surface  looks  (and  is)  2-D,  unlike  the  surface  of  an  actual  wall.  Windows  and  doors  on  a  real
wall  surface  are  typically  inset  into  the  wall  surface,  and  are  surrounded  by  framing  material
that  extends  out  beyond  the  wall  surface.  While  these  effects  are  barely  noticeable  from  a
distance,  they  are  quite  pronounced  close  up.  A  further  problem  is  that  the  resolution  of  the
surface  texture  map  is  limited  by  the  resolution  of  the  original  image.  As  you  move  closer
to  the  surface,  more  detail  should  become  apparent;  however,  the  graphics  surface  begins  to
look  “pixelated,”  and  features  become  blurry.  In  particular,  some  of  the  window  features  on
the  building  models  we  have  produced  are  near  the  limits  of  the  available  image  resolution.

What  is  needed  to  go  beyond  simple  intensity  mapping  is  explicit  extraction  and  rendering
of  detailed  surface  structures,  such  as  windows,  doors  and  roof  vents.  UMass’  current
intensity  map  extraction  technology  provides  a  convenient  starting  point,  since  rectangular
lattices  of  windows  or  roof  vents  can  be  searched  for  without  complication  from  the  effects
of  perspective  distortion,  and  specific  surface  structure  extraction  techniques  can  be  applied
only  where  relevant,  i.e.  window  and  door  extraction  can  be  focused  on  wall  intensity  maps,
while  roof  vent  computations  are  performed  only  on  roofs.  As  one  example,  a  generic  algorithm
has  been  developed  for  extracting  windows  and  doors  on  wall  surfaces,  based  on  a
rectangular  region  growing  method  applied  at  local  intensity  minima  in  the  unwarped  intensity
map.  Extracted  window  and  door  hypotheses  are  used  to  compose  a  refined  building
model  that  explicitly  represents  those  architectural  details.  An  example  is  shown  in  Figure  10.
The  windows  and  doors  have  been  rendered  as  dark  and  opaque,  but  since  they
are  now  symbolically  represented,  it  would  be  possible  to  render  the  windows  with  glass-like
properties,  such  as  transparency  and  reflectivity.
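
The text gives only a one-line description of the rectangular region growing method, so the following Python sketch is an assumed, simplified version rather than the algorithm actually used. Starting from a seed pixel at a local intensity minimum, an axis-aligned rectangle is extended one row or column at a time on each side for as long as the newly added strip is dark on average. The image representation (a list of rows of gray values), the mean-intensity test, and the threshold are illustrative choices.

```python
def grow_rectangle(image, seed, threshold):
    """Grow a dark rectangle around `seed` = (row, col) in a gray-level image.

    Returns (top, left, bottom, right), all inclusive. A side is extended
    by one row or column when the mean intensity of the new strip is at
    most `threshold`; growth stops when no side can be extended.
    """
    rows, cols = len(image), len(image[0])
    top = bottom = seed[0]
    left = right = seed[1]

    def is_dark(strip):
        return sum(strip) / len(strip) <= threshold

    grew = True
    while grew:
        grew = False
        if top > 0 and is_dark(image[top - 1][left:right + 1]):
            top -= 1
            grew = True
        if bottom < rows - 1 and is_dark(image[bottom + 1][left:right + 1]):
            bottom += 1
            grew = True
        if left > 0 and is_dark([image[r][left - 1]
                                 for r in range(top, bottom + 1)]):
            left -= 1
            grew = True
        if right < cols - 1 and is_dark([image[r][right + 1]
                                         for r in range(top, bottom + 1)]):
            right += 1
            grew = True
    return top, left, bottom, right
```

Working in the unwarped intensity map is what makes an axis-aligned rectangle a reasonable shape model here: once perspective distortion is removed, window and door outlines are aligned with the rows and columns of the map.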

Future  work  on  extraction  of  surface  structures  will  concentrate  on  roof  features,  such
as  pipes  and  vents,  that  appear  as  “bumps”  on  an  otherwise  planar  surface  area.  Visual
cues  for  this  reconstruction  include  shadows  from  monocular  imagery,  as  well  as  disparity
information  between  multiple  images.  This  is  a  challenging  problem  given  the  resolution  of
available  aerial  imagery.


Figure  10:  Rendered  building  model  before  and  after  symbolic  window  extraction.


6  Summary  and  On-Going  Work 

A  large  research  effort  is  underway  at  UMass  to  develop  capabilities  for  automated  site
modeling  from  aerial  images.  The  Ascender  system  has  been  developed  to  extract  and
model  flat-roofed,  rectilinear  buildings  from  multiple  views.  Version  1.0  of  Ascender  has 
been  delivered  to  Lockheed-Martin  for  testing  on  classified  imagery  and  for  integration  into 
the  RADIUS  Testbed.  An  evaluation  of  Ascender  on  an  unclassified  data  set  of  Ft.  Hood 
has  been  performed  at  UMass.  The  results  suggest  that  the  system  performs  reasonably  well 
in  terms  of  detection  rate  and  accuracy,  and  that  performance  degrades  gracefully  when  the 
number  of  images  used  is  small.  Much  more  testing  will  be  needed  to  determine  how  the 
system  performs  under  various  weather  and  viewing  conditions,  in  order  to  formulate  a  set 
of  recommendations  as  to  how  and  when  to  use  the  system. 

Algorithms  and  strategies  for  extracting  other  common  building  classes  with  peaked,
curved  and  multi-level  flat  roofs  are  being  developed  and  tested  in  the  lab  for  eventual
inclusion  into  Ascender.  Moving  beyond  a  single  control  strategy  for  detecting  a  single  class
of  buildings  brings  to  the  forefront  issues  of  context-sensitive,  model  class  selection,  data
fusion,  and  hypothesis  arbitration,  and  these  topics  are  the  focus  of  our  current  research
efforts.  Research  on  symbolic  extraction  of  small  surface  features,  such  as  windows  and  doors,
also  is  being  performed.  Initial  results  show  that  the  idea  is  feasible,  although  challenging,
and  that  the  payoff  is  large  in  terms  of  realistic  scene  rendering.